ZSDOS, Anatomy of an Operating System, Part II

                               by
Harold F. Bower, Major, US Army Signal Corps; BSEE, MSCIS, Ham 
(WA5JAY), avid homebuilder (starting with 8008 running SCELBAL).
                               and
Cameron W. Cotrill, Vice President, Advanced Multiware Systems; 
specialist in "impossible" real-time hardware and software 
systems.


In the first part of this article, we presented the philosophy 
and the features of ZSDOS (Z-System Disk Operating System). In 
this portion, we will summarize the performance of ZSDOS, share a 
few of the tricks we used to shoehorn all these features into 
3.5k bytes, and give a few programming examples showing how to use 
some of the new features of ZSDOS and ZDDOS.

ZSDOS Performance.

Measuring the performance improvements of ZSDOS is a complicated 
matter. During development, an entire suite of tests was run on 
ZS/ZDDOS in various configurations in an attempt to validate the 
design tradeoffs. The most revealing tests of BDOS differences 
turned out to be a series of assemblies done under control of a 
command script. This should be no surprise as assemblies are by 
nature disk intensive. 

To reduce the perception that our results are "tailored" or 
skewed in favor of a particular system or configuration, 
different processor chips (Z80 and HD64180), different BIOSes 
(MicroMint, XBIOS, Ampro), and different media (RAM disk, Hard 
Disk and Floppy disk) were used in the timed runs. Since the 
results were most affected by the media, results are shown in the 
categories of RAM, Hard Disk and Floppy Disk performance. No form 
of file date stamping was done since ZSDOS would have a distinct 
advantage in this field.

Three sets of hardware were used in these analyses in an attempt 
to keep any unique processes in a given system from skewing the 
results. The first system (System 1 in the 
timing runs) was a "stock" MicroMint SB-180 operating at a 6.144 
MHz clock speed. System 2 was an Ampro Little Board 1A with a Z80 
running at 4.0 MHz, and System 3 was a homebrew Z-180 system 
designed to be compatible with the SB-180 operating at 9.216 MHz. 
Complete information on each system is given in the Appendix.


OPERATING SYSTEMS.

CP/M 2.2. Gary Kildall and Digital Research developed this 
operating system for 8-bit processors in an evolutionary process 
on early 8080-based computers. A subsequent product, CP/M Plus 
(also known as CP/M 3), is still in limited use, but has not 
gained the wide acceptance of the earlier release. CP/M 2.2 is 
coded in 8080 assembly language and is a non-banked, 
non-reentrant, single-user, single-tasking operating system.

ZRDOS 1.9. Echelon Incorporated released many versions of this 
CP/M 2.2-compatible operating system over the past several years. 
It is coded in Z80 assembly language and will therefore not 
execute on 8080 processors. Some additional features were added, 
such as one-level reentrancy under user control, and return of 
the current DMA address. Later versions (after 1.5) include 
enhanced support for hard disk media by not rebuilding the 
allocation bit map on a disk relog command. Version 1.9 added 
larger disk and file sizes. Like CP/M, it is single-user and 
single-tasking.

ZSDOS. This is the topic of this article, with details and 
descriptions of features contained in Part I. ZSDOS is coded in 
Z80 assembly language and is also a single-user, single-tasking 
operating system capable of single-level reentrancy.

Since this report was aimed at formalizing an evaluation of 
the performance characteristics of ZSDOS, a number of different 
variants of the above operating systems were initially timed. 
Because the performance of these systems was very similar to 
others in the test, their comparative results are simply 
summarized below.

CP/M 2.2 with Plu*Perfect Systems' PUBlic patch. Only minor 
differences in performance from the basic CP/M 2.2 were noted, so 
results of the patched system were not included in the final 
results.

ZRDOS 1.2. The performance of ZRDOS 1.2 was very close to CP/M 
2.2, being a couple of percent slower in the majority of cases. 
It was therefore not included in the final timing analyses.

ZRDOS 1.7. Timing tests indicate no significant performance 
differences between ZRDOS 1.7 and 1.9.

ZDDOS. Since ZSDOS and ZDDOS are largely the same code and since 
comparative timings between them show less than a 1% difference, 
only times for ZSDOS will be presented. 


BASIC IO SYSTEMS (BIOSes).

MICRO MINT, SB-180. While MicroMint currently ships Version 3.2 
with their systems, a slightly modified version of 2.7 was used 
in these timings on the SB-180. The changes included independent 
step rates for floppy drives, different floppy formats and fixing 
of eight-inch drivers as well as a slight amount of optimization. 
Little performance difference from the standard BIOS should be 
noticed. A 54k system size was used. The BIOS uses programmed IO 
on most peripherals with DMA functions of the 64180 processor 
used for Floppy and RAM disk data movement.

XBIOS, SB-180. XSystems' XBIOS version 1.1 is an extremely 
powerful and flexible banked system with excellent tools and 
interfaces. Malcolm Kemp has concentrated on providing functions 
in this release, and has deferred optimization to future 
releases. XBIOS fully supports the ETS180 IO+ board, allows 
complete configuration of peripherals, and provides a larger TPA 
since only a small kernel resides in the primary memory area. 
Most of the BIOS code resides in an alternate memory bank. XBIOS 
installs the largest possible TPA, which was 57.5k for these 
tests. XBIOS was installed with three buffers for disk IO.

AMPRO, Little Board-1A. A stock version of the Ampro version 3.8 
BIOS assembled with no ZCPR support was used for testing. A 
system size of 59k was chosen to provide support for 5 hard disk 
partitions spread over two physical drives. NZCOM was then loaded 
to provide Z-System support. The Ampro BIOS is strictly a polled 
system and uses no interrupts or DMA.


EVALUATION PROCEDURES.

Since the goal of evaluating performance was to heavily exercise 
BDOS functions, a set of fourteen assembly modules (thirteen of 
2-4k in size and one of 6k) was assembled to produce 
Microsoft REL files. To restrict external influences, no file 
date stamping was used, and many ZSDOS features such as Public 
and Path were disabled. On the other hand, to provide a semi-
realistic setting, ZEX.COM and the executable assemblers were 
placed in a different Drive/User with the ZCPR search path set to 
locate the files on the second directory scan. SLR's SLR180 
assembler was used on system 2, while tests on systems 1 and 3 
used Z80ASM+. Assembly was done under the control of a memory-
based SUBMIT utility (ZEX Version 3.1A) script file. Times were 
measured from the carriage return terminating the command 
invoking the ZEX file to display of the "Done" message after 
assembly of the last file. After each run, the .REL files 
produced by the assembly were erased so that the same disk space 
could be used in the next run. No other files were added to or 
deleted from any media during the timing runs. At least three runs 
were performed for each configuration, and the results averaged. 
Timing was manually performed with a stopwatch.

Due to the radical differences in access times for different 
media, three categories of times were considered: RAM disk, Hard 
Disk, and Floppy disk. If you think you know how each system 
fared, read on - there may be a twist or two in the plot. 

RAM DISK. The Ampro has no RAM disk, so timings in this category 
reflect only the SB180. The SB180 computer is equipped with 256k 
of memory. The standard MicroMint BIOS divides this into a 64k 
main memory area and a 192k RAM disk. With XBIOS as tested here, 
64k is allocated for the main memory, 24k for the banked portion 
of XBIOS, buffers and banked system extensions. The remaining 
space is available for a RAM disk. RAM disks on the SB180 use 
built-in DMA capabilities of the HD64180 processor to move 
"sectors" of data rather than the slower block move instructions 
used by Z80 systems.

Exiting a program via the Warm Boot vector in CP/M relogs the A 
drive. To minimize time penalties imposed by this, a Hard disk 
partition was defined as the A drive. Needed programs as well as 
the assembly modules were placed on the RAM disk (M:), with 
ZEX.COM and Z80ASM+.COM placed in User 15 and the source files 
in User 0. The search path for this phase was: Drive M, User 0 to 
Drive M, User 15.

Since the RAM disk is defined as non-removable media in the 
Disk Parameter Block, the "Rapid Relog" feature of ZSDOS and 
ZRDOS was expected to produce much shorter execution times than 
CP/M for this series of measurements. As can be seen from the 
results, this was indeed the case. The raw timings in seconds 
with percentage changes from the shortest time are:

		ZSDOS		 ZRDOS 1.9	 CP/M 2.2
	    +------------------------------------------------+
 BIOS 2.7   | 17.0 (---)	17.7 (+4%)	36.4 (+114%) |
 XBIOS 1.1  | 14.2 (---)	14.5 (+2%)	34.5 (+144%) |
	    +------------------------------------------------+

The effects of the Rapid Relog feature were borne out, with ZSDOS 
being a couple of percent faster than ZRDOS. Disabling the Rapid 
Relog feature of ZSDOS produced nearly identical results to CP/M, so 
most of the additional time for that system may be attributed to 
rebuilding the disk allocation bit maps for Drives A and M on 
each warm boot.


HARD DISK.

Three systems, 6.144 MHz SB-180 (System 1), 4.0 MHz Ampro Little 
Board-1A (System 2), 9.216 MHz Z-180 Homebrew SB-180 (System 3), 
were used to gather information for this phase. This latter 
system was added to demonstrate performance on a heavily loaded 
system.

	          ZSDOS		  ZRDOS 1.9	  CP/M 2.2
	     +------------------------------------------------+
 1-BIOS 2.7  | 0:54.7 (---)	1:16.6 (+40%)	1:34.7 (+73%) |
 1-XBIOS 1.1 | 0:52.2 (---)	1:15.4 (+44%)	1:33.4 (+79%) |
 2-AMPRO     | 1:55   (---)	2:44   (+43%)	3:15   (+70%) |
 3-BIOS 2.7  | 1:07.7 (---)	1:40.6 (+49%)	1:50.2 (+63%) |
 3-XBIOS 1.1 | 1:29.5 (---)	2:06.4 (+41%)	2:11.3 (+47%) |
	     +------------------------------------------------+

As in the previous RAM Disk results, the results of ZSDOS with 
"Rapid Relog" disabled and CP/M were nearly the same confirming 
that rebuilding the allocation bit maps on a disk relog is the 
principal cause for the increased CP/M times.

All reported times were made with a path which forced a search of 
the current directory before locating executable files on the 
second path element. As an experiment, the path on the Ampro 
system was changed to go directly to A2:, eliminating the current 
directory scan. All DOSes showed an identical 10 second speedup, 
indicating directory scan time for all DOSes was the same.

A further point to note is the effect of multiple disk buffers on 
performance. For system 1, the number of buffers was adequate to 
retain directory information which improved performance over the 
single-buffer MicroMint BIOS by 1-5%. In system 3, the buffering 
was inadequate to retain necessary information, so the multiple 
buffers were of no benefit.


FLOPPY DISK.

Examination of system performance on a Floppy Disk system was 
tailored to duplicate, as closely as possible, a hypothetical 
operating configuration using multiple drives with a non-trivial 
search path along differing Drives and User area lines.

Since all three primary operating systems of interest to this 
analysis (ZSDOS, CP/M 2.2 and ZRDOS 1.9) rebuild removable-media 
disk allocation maps on a relog, there was no need to explicitly 
disable the "Rapid Relog" feature of ZSDOS for this portion of 
the study. Results are:

		ZSDOS		 ZRDOS 1.9	  CP/M 2.2
	      +----------------------------------------------+
 BIOS 2.7     | 2:18.7 (+2%)	2:22.4 (+5%)	2:16.0 (---) |
 XBIOS 1.1    | 2:29.5 (+0.5%)	2:32.7 (+3%)	2:29.0 (---) |
 AMPRO        | 2:26   (+1%)	2:28   (+2%)	2:25   (---) |
	      +----------------------------------------------+

Since all of the operating systems are functionally identical in a 
Floppy Disk configuration, we did not expect large differences in 
measured times. We were therefore not surprised with variations 
over a spread of only five percent. While we strove to make ZSDOS 
as efficient as possible, CP/M was still the champ on floppy 
systems by a nose.

As a final comparison test between the three DOSes, the amount of 
time WordStar 4 took to ^QC and ^QR through the 92k ZSDOS source 
file was measured under all three DOSes. All timings were within 
1%, indicating that read/write to open file times were similar.


PERFORMANCE CONCLUSIONS.

ZSDOS offers significant improvements in system performance on 
CP/M 2.2-compatible, Z80-family computer systems with fixed 
media, even under the restricted test conditions which disabled 
some of the most powerful features of ZSDOS. Even more impressive 
results may be obtained in a "tuned" installation with such 
features as Public files and proper selection of the DOS search 
path (improvements of 9% on a hard disk system are typical).

The other major conclusion that can be drawn from this effort is 
that the selection of a BIOS tailored to the requirements is 
crucial to achieving optimum performance. The multiple buffering 
capability of XBIOS offers speed increases in systems where an 
adequate number of buffers exists, but degrades floppy-based and 
heavily loaded hard disk performance.

During the data gathering for this report, an anomaly was noted 
with respect to CP/M Plus (or P2DOS) stamps. System #1 was 
initialized for P2DOS stamps on the disk holding data files to 
quantify the differences. In all cases ZSDOS was affected less 
than one percent, yet ZRDOS increased to seven percent longer 
than ZSDOS on RAM disk, 20% longer on floppy and 144% longer on 
hard disk. CP/M 2.2 was similarly affected, but to a lesser 
degree, increasing times over ZSDOS to 115% on RAM disk, ten 
percent on floppy and 140% on hard disk. While neither ZRDOS nor 
CP/M 2.2 can manipulate this type of stamp, merely using a disk 
which is so prepared will result in slower processing.


HOW WE DID IT.

During the year or so that we pursued our independent paths in 
modifying H.A.J. Ten Brugge's excellent P2DOS alternative to CP/M 
2.2's BDOS, our approaches were somewhat diverse. While Cam's 
approach was directed at perfecting features, Hal's effort was 
directed at streamlining the code to create a "speed demon" 
operating system, and Carson concentrated on enhancing embedded 
Date Stamping. In mid-1987, Bridger Mitchell was instrumental in 
getting us to pool our resources and collaborate in a joint 
venture. The results have been more than worth it. In Part I, we 
described the functional enhancements and standards embodied in 
ZSDOS, and have just shown the performance improvements compared 
to CP/M 2.2 and ZRDOS 1.9. In our efforts to foster better code 
for our 8-bit systems, we would now like to describe how the task 
of adding features and decreasing execution time was accomplished 
without increasing the Operating System memory requirements.

The topic of code optimization is a controversial one. In the 
early days of computers, programmers were saddled with small 
memory space and slow processors, so every effort was made to 
optimize programs for speed and size. As memory became cheaper 
and processors emerged with ever increasing clock speeds, 
these programming techniques became lost to all but a few. This same 
path of evolution has also been followed in the Personal Computer 
field.

To demonstrate this point, first compare the 3.5 kbyte CP/M 2.2 
BDOS and the 1 kbyte Plu*Perfect DateStamper to the functionally 
superior 3.5k ZDDOS. Next, compare the 3.5 kbyte size of CP/M 2.2 
and ZSDOS to the 16 kbyte size of the functionally similar MS-DOS 
2.1. To carry the point further, contrast the almost 16 kbyte 
COMMAND.COM to the 7 kbyte size of a more capable ZCPR3 Command 
Processor with a full environment. Some of this bloat is 
understandable with the change in processor chips. On the other 
hand, the more powerful instructions of 16-bit 808x processors 
should have counteracted a good portion of this code bloat.

In line with the size comparisons, execution speeds also suffer 
with the larger code. Friends and co-workers who are used to 
working with PCs and clones operating at 4.77 and 8 MHz clock 
rates are constantly amazed at the speed of even a lowly 4 MHz 
ZSDOS system, and dazzled at the 6 and 9 MHz Hitachi 64180 
systems running the same software! While much of this is 
subjective, quite a bit is due to the fact that the "smaller" 8-
bit code has been hand-coded and optimized, whereas the PC arena 
is devoting more of its energy to coding in high-level languages. 
This makes sense under certain circumstances (e.g. during 
development and for long-term maintainability), but it most 
certainly does NOT make sense for operating systems where size 
and speed are of the essence.

Since all of our efforts have been directed at the Zilog Z80 and 
compatible family of microprocessors (including Hitachi's 64180 
and National's NSC800), the optimization steps covered here apply 
directly only to these. Having stated that, we also need to point 
out that many of the basic concepts will still apply to other 
processors, although details may differ.

No matter what processor is used, the goals of faster program 
execution and smaller memory size are in conflict. Smaller memory 
size normally means using each section of code as many times as 
possible - typically by using many subroutines. Faster code 
execution often means avoiding as many subroutine calls as 
possible. In every program undergoing optimization, the 
conflicting size and speed requirements must be balanced. This 
balance can be highly subjective. In ZSDOS, code size was the 
primary concern though significant effort was given to making the 
smaller code run as fast as possible. 

Now for the minutiae. If you are not a programmer, or are 
interested only in how to use ZSDOS, you might want to skip to 
PROGRAMMING FOR ZSDOS. For the diehards - here it is!

One of the first techniques we used in optimizing code was to 
examine all JUMP instructions. The basic instruction is three 
bytes long and executes in 10 clock cycles on a Z80. These 
absolute jumps may be unconditional (JP addr), or conditional (JP 
C,addr) based on the contents of the Carry, Zero, Sign or 
Parity/Overflow flags. The Z80 also features a Relative jump (JR) 
which may likewise be unconditional (JR addr), or conditional 
(JR C,addr) based on the Carry or Zero flags. The relative jump 
is only two bytes long and may branch only to addresses within 
the range of +127 to -128 bytes of the instruction following the 
jump. While it is relatively easy to blindly change all jump 
instructions within range to Relative jumps, the careful 
programmer will also note that the Relative jump may carry a time 
penalty. The unconditional relative jump, and conditional 
relative jumps where the condition is satisfied (the jump is 
taken), require 12 clock cycles compared to the long jump 
consuming only 10 cycles regardless of condition. On the other 
hand, conditional relative jumps need only 7 cycles if the 
condition is false. This type of optimization was one of the 
first used in our efforts to enhance P2DOS.
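
As a rough illustration of the tradeoff, the two forms of a 
conditional branch can be set side by side. The labels (AGAIN1, 
AGAIN2) and the use of C as a counter are hypothetical and are 
not taken from the ZSDOS source; the timings are the standard Z80 
figures quoted above.

AGAIN1:	DEC	C		; Count down (sets Zero at 0)
	JP	NZ,AGAIN1	; 3 bytes, 10 cycles taken or not

AGAIN2:	DEC	C		; Same loop, relative form
	JR	NZ,AGAIN2	; 2 bytes, 12 cycles taken, 7 if not

The relative form always saves a byte, but it costs two extra 
cycles each time the branch is taken and gains three cycles only 
when the branch falls through.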

The next simple optimizing technique we used was to make maximum 
use of the Decrement-B and Jump Relative if Not Zero (DJNZ) 
instruction. This two-byte sequence executes in 8 or 13 clock 
cycles (B=0 and B<>0 respectively) for an absolute time and code 
saving over separate decrement/jump sequences. In some of our 
work on ZSDOS, using this instruction required redefining 
register usage to free up the B register for use as a counter.
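
A minimal sketch of the difference follows. The routines and 
their labels (CLRA, CLRB, CLR1, CLR2, BUFFER) are made up for 
illustration and do not appear in ZSDOS.

; Separate decrement and jump: 3 bytes, 16 cycles per pass
CLRA:	LD	HL,BUFFER	; Point to a 16-byte work area
	LD	B,16		; Byte count
	XOR	A		; Fill value of zero
CLR1:	LD	(HL),A		; Clear one byte
	INC	HL		; ..and advance the pointer
	DEC	B		; 1 byte, 4 cycles
	JR	NZ,CLR1		; 2 bytes, 12 cycles taken, 7 at exit
	RET

; Same loop with DJNZ: 2 bytes, 13 cycles per pass
CLRB:	LD	HL,BUFFER
	LD	B,16
	XOR	A
CLR2:	LD	(HL),A
	INC	HL
	DJNZ	CLR2		; 2 bytes, 13 cycles taken, 8 at exit
	RET

BUFFER:	DEFS	16		; Work area (hypothetical)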

Another simple optimizing step was examining the use of the IX 
register. IX holds the argument passed to DOS in the DE register 
(typically a file control block pointer). Despite having this 
value available all the time, there were a significant number of 
cases when faster and/or shorter code was produced by moving the 
pointer into HL. This was normally the case when the same offset 
within the FCB was accessed two or more times in succession.

The final "simple" optimization technique we used was to examine 
all PUSHes and POPs to the stack and delete any found to be 
unnecessary. While this sounds simple, it is quite a chore in a 
complex program such as ZSDOS where CALLs call other CALLs which 
call still other CALLs, etc. Each path must be examined to insure 
that the registers are, in fact, not altered or needed.

After the above "simple" optimizations were performed, a series 
of what we term "moderate" optimization steps were undertaken. 
One of these involved examining all series of sequential checks 
on a byte (such as the input command character scanner) and 
structuring the check sequences to optimize performance based on 
clock cycle counting mentioned above, and estimated frequency of 
access for various commands. In the case of the command 
dispatcher, this technique resulted in extremely fast command 
parsing implemented with minimum code.

Sequential bit shifts and rotates are another area where more 
analysis is required before final code can be written. Sixteen-
bit shifts, and 8-bit shifts in registers other than the 
accumulator are areas where gains can be achieved. The usual 
method of using a subroutine which loads all bytes to the 
accumulator for shifts and rotates fares poorly if only one or 
two bit shifts are needed. While most of these cases had been 
removed from the P2DOS code by the original author, the 
replacement inline code still suffered from some inefficiencies. 
A two-bit shift right (division by 4) of the 16-bit HL register 
pair in the STDIR routine using the code:

	SRL	H	; Divide by 2
	RR	L
	SRL	H	; Divide by 4
	RR	L

proved optimum. Using a two-iteration loop with the DJNZ 
instruction around a single SRL H, RR L sequence would have 
produced the same 8-byte code length, but at a penalty of 21 
clock cycles. A call to a subroutine would have fared even worse 
with a 27 clock cycle CALL/RET penalty, and four bytes of 
overhead. On the other hand, three-bit shifts of the HL 
register pair occurred in a number of routines. These were 
consolidated into a single callable routine that uses the B 
register as a counter in an iterative loop with the sequence:

SHRHL3:	LD	B,3
SHRHLB:	SRL	H
	RR	L
	DJNZ	SHRHLB
	RET

While the replacement code added overhead, it saved 3-5 bytes of 
code (depending on entry point) which were sorely needed to add 
additional features. ZSDOS calls this routine from three places, 
while ZDDOS calls it from five. The difference is due to ZSDOS 
"unrolling" the loop in time critical routines.
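
For a sense of scale, the cost of a three-bit shift in each form 
can be tallied from the standard Z80 instruction timings. This is 
arithmetic on published cycle counts, not a measured result.

; Inline, "unrolled" form: 12 bytes, 48 cycles
	SRL	H		; 8 cycles
	RR	L		; 8 cycles
	SRL	H
	RR	L
	SRL	H
	RR	L

; Shared routine: 3 bytes at each call site
	CALL	SHRHL3		; CALL          = 17
				; LD B,3        =  7
				; 3 x SRL/RR    = 48
				; DJNZ 13+13+8  = 34
				; RET           = 10
				; Total         = 116 cycles

The call costs well over twice the cycles of the inline form, 
which is why unrolling is worth its extra bytes only on the time 
critical paths.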

Shifts to the left were occasionally handled a little more 
efficiently by using the 16-bit ADD instructions of the HL 
register pair to perform bit shifts. An example of this appeared 
in the CALST routine. In this case, the DE register pair was 
rotated one bit to the left with sequential RL E, RL D 
instructions, with the Carry bit shifted into the HL register 
pair. Where the original code used the sequence: RL L, RL H to 
shift the bit into the HL pair, a two byte code savings was 
achieved with the single two-byte ADC HL,HL instruction.
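
A sketch of the two forms is shown below, assuming a 32-bit 
quantity with its high word in HL and its low word in DE. The 
actual register usage in CALST may differ in detail.

; Original form: 8 bytes, 32 cycles
	RL	E		; Shift low byte, bit 7 to Carry
	RL	D		; ..Carry in, bit 15 of DE to Carry
	RL	L		; Carry into the low byte of HL
	RL	H		; ..and on through the high byte

; Shorter form: 6 bytes, 31 cycles
	RL	E
	RL	D
	ADC	HL,HL		; HL = HL + HL + Carry (2 bytes, 15)

The Carry out of bit 15 of HL is the same in both forms. The 
other flags differ slightly, since ADC HL,HL sets Zero from the 
full 16-bit result while RL H reflects only the H register.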

Another area where considerable code and time savings were 
realized was in the consolidation of routines into "straight-
line" code. While this seems to be an anathema to structured 
programmers, it is often a must to obtain the performance 
improvements which we sought from our efforts. As a first step, 
all routines ending in Jump instructions were examined. Target 
addresses were then checked to insure that no other routine "fell 
through" to them. If it was in fact a "stand-alone" routine, it 
was moved to the end of the first routine so that the Jump could 
be deleted. An example of this is where the INITDR routine was 
moved to follow SELDK directly saving the two-byte relative jump 
and 12 clock cycles. Other cases involving long jumps saved three 
bytes and 10 clock cycles. A minor variation in relocation of 
code is to group functions to bring them within range of relative 
jumps thereby saving one byte at the expense of two clock cycles. 
This minor penalty in time was often outweighed by the value of a 
single byte of code in our efforts.

A variant on this concept involved examining sequences of code 
for duplication, and combining identical sequences into new 
routines which "fall through" to the destination. This was used 
to define a new routine:

SRCT15:	LD	A,15
	CALL	SEARCH

This sequence was placed immediately before the TSTFCT routine, 
and replaced three occurrences of:

	LD	A,15
	CALL	SEARCH
	CALL	TSTFCT

with a single CALL to SRCT15. The overall effect of this one 
change was a savings of 10 bytes of code and 24 clock cycles for 
each of the three sequences replaced.

Detailed examination of code also produced unexpected savings by 
merely defining new labels. As an example, the last three 
instructions of the routine OPENEX were:

	LD	A,0FFH
	LD	(PEXIT),A
	RET

This sequence occurred two other times in the original code, and 
three times in the latest version of ZSDOS. The last two 
instructions were repeated in many locations, so one location was 
selected (centrally located to take advantage of relative jumps), 
with other instances accessing it with a call or jump to the new 
label, SAVEA. Setting the value to 0FFH in OPENEX was labeled as 
SETCFF, with the other two occurrences jumping to this location. 
While a small time penalty was incurred in jumping to this common 
code, the three byte savings was again needed to add features.
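
Laid out as code, the shared tail described above would look 
roughly like this. SETCFF, SAVEA and PEXIT are the names used in 
the text; the caller shown (ELSEWH) is invented for illustration.

SETCFF:	LD	A,0FFH		; Entry to return 0FFH
SAVEA:	LD	(PEXIT),A	; Entry to return the value in A
	RET

ELSEWH:	...			; Some other routine
	JR	SETCFF		; ..exits through the common tail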

Our code "walk-throughs" and optimization efforts did not stop 
with the original code, but continued with every test version. 
First, we discovered a common "shell" of instructions around the 
DELETE, CSTAT, and RENAME functions and combined them with a net 
savings of 12 bytes. Later, a trick used in public-domain inline 
print routines to pass addresses on the processor's stack was 
used to recover five bytes of code by replacing three sequences 
of:

	LD	HL,address
	JR	COMCOD

with three 3-byte CALL COMCOD instructions. The trick involved in 
this change was to place the CALLs immediately in front of the 
routines whose addresses were to be passed to COMCOD. When 
executed, the CALL placed the routine address on the stack. A 
one-byte POP HL instruction at the beginning of COMCOD completed 
the change by placing the address in the desired HL register. 
Still later, the internal code in the COMCOD routine was again 
optimized to remove several memory references. This saved another 
four bytes.
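
The mechanism can be sketched as follows. Only COMCOD is a name 
from the text; ENTRYA, ENTRYB, RTNA and RTNB are placeholders, 
and the body of COMCOD is elided since it is not described here.

ENTRYA:	CALL	COMCOD		; Pushes the address of RTNA
RTNA:	...			; Routine whose address is passed
	RET

ENTRYB:	CALL	COMCOD		; Pushes the address of RTNB
RTNB:	...
	RET

COMCOD:	POP	HL		; HL = address following the CALL
	...			; Common code works with HL; the
				; ..CALL never returns to its site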

Cameron's rewrite of the Console IO routines demonstrated another 
technique of reducing code size with very little overhead. The 
majority of affected code involved different DOS commands, yet 
exited through common code with absolute jumps. By PUSHing the 
exit address on the stack prior to jumping to the routines, a 
simple RETurn instruction sufficed to direct execution through 
the exit code saving two bytes per occurrence. The four bytes 
required to set the return address meant that the code size 
break-even point occurred at two instances. Since far more cases 
than that were involved, a significant code size reduction was 
realized. For DOS function calls, the time penalty incurred was 
21 clock cycles; however, that was not considered significant 
when dealing with the normal serial IO devices used in console 
functions.
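
A minimal sketch of the arrangement is given below, with 
hypothetical labels (CONFN, FUNC1, FUNC2, EXIT) standing in for 
the real console routines.

CONFN:	LD	HL,EXIT		; 3 bytes, 10 cycles
	PUSH	HL		; 1 byte, 11 cycles
	...			; Dispatch to FUNC1, FUNC2, etc.

FUNC1:	...
	RET			; 1 byte, where JP EXIT took 3

FUNC2:	...
	RET			; ..likewise

EXIT:	...			; Common exit code

The four setup bytes are paid once, and each routine that ends 
in RET instead of JP EXIT earns back two of them.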

A final noteworthy trick was added by Cameron which neither of us 
had ever seen documented in the Z80 world. It used the sixteen-
bit load instruction into the IX register (a four byte 
instruction) to "fall through" successive 16-bit loads to the 
primary registers. In this fashion, the sequence:

CMND27:	LD	HL,(ALV)
	JR	SAVHL

CMND24:	LD	HL,(LOGIN)
	JR	SAVHL

CMND31:	LD	HL,(IXP)
	JR	SAVHL

CMND47:	LD	HL,(DMA)
SAVHL:	LD	(PEXIT),HL
	RET

was replaced by a more efficient (in code size) construct. The 
bytes, as coded, are on the left, with the instructions seen by 
CMND27 shown on the right:

CMND27:	LD	HL,(ALV)	CMND27: LD	HL,(ALV)
	DEFB	0DDH			LD	IX,(LOGIN)
CMND24:	LD	HL,(LOGIN)
	DEFB	0DDH			LD	IX,(IXP)
CMND31:	LD	HL,(IXP)
	DEFB	0DDH			LD	IX,(DMA)
CMND47:	LD	HL,(DMA)
SAVHL:	LD	(PEXIT),HL		LD	(PEXIT),HL
	RET				RET

This code works because the IX register is not used in the 
remainder of the exit code, and the entry IX value is restored 
upon returns from ZSDOS functions. Each cascaded value saves one 
byte of code, but adds additional clock cycles to the execution 
time. Where the original code required a constant 28 clock cycles 
before arriving at the SAVHL routine, the new code execution time 
is different for each entry point. In this example, the time (in 
clock cycles) required for each entry point to arrive at SAVHL 
is:

	CMND47	- 16 cycles
	CMND31	- 20 + 16 = 36
	CMND24	- 20 + 20 + 16 = 56
	CMND27	- 20 + 20 + 20 + 16 = 76

At this point, an analysis of probable calling frequency was done 
to order the calls so that the most frequently used functions 
would incur the least penalty. The ordering shown here was judged 
to be the optimum sequence.
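
At the byte level, the overlap for the first two entry points can 
be written out with DEFB and DEFW. The opcodes are the standard 
Z80 encodings (2AH for LD HL,(nn), and 0DDH as the IX prefix); 
the layout is a sketch rather than the ZSDOS source itself.

CMND27:	DEFB	2AH		; LD HL,(ALV)
	DEFW	ALV
	DEFB	0DDH		; Prefix: next 2AH becomes LD IX,(nn)
CMND24:	DEFB	2AH		; LD HL,(LOGIN) if entered here,
	DEFW	LOGIN		; ..LD IX,(LOGIN) if fallen into

Entered at CMND24, the processor never sees the 0DDH byte. 
Falling in from CMND27, it reads 0DDH, 2AH and the address as one 
four-byte instruction that loads IX and leaves HL alone.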

In a similar manner, eight-bit loads of the A register were 
consolidated at the beginning of the SEARCH routine. Our analyses 
of the code showed that SEARCH was called several times with 
values of 12 and 15 in the A register. Loading of these values 
was relocated to the beginning of SEARCH, then consolidated with 
another single-byte DEFB prefix. The resultant code as entered, 
and as seen by SEAR12 is:

SEAR12:	LD	A,12		SEAR12:	LD	A,12
	DEFB	21H			LD	HL,0F3EH
SEAR15:	LD	A,15
SEARCH:	...			SEARCH:	...

Instead of posing a time penalty as the LD IX,(nn) trick described 
above, this case saved one byte over a relative jump and two 
clock cycles (JR = 12 cycles, LD HL,nn = 10 cycles). As above, 
this worked because the HL register contents were "don't care" 
upon entry to the SEARCH routine.
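
The same sequence written out byte by byte shows where the value 
0F3EH comes from. The opcodes are the standard Z80 encodings 
(3EH for LD A,n and 21H for LD HL,nn); the layout is illustrative 
only.

SEAR12:	DEFB	3EH,12		; LD A,12
	DEFB	21H		; LD HL,nn takes the next two bytes
SEAR15:	DEFB	3EH,15		; LD A,15, or the operand 0F3EH
SEARCH:	...

Since the operand is stored low byte first, the bytes 3EH, 0FH 
read as the 16-bit value 0F3EH when they are swallowed by the 
LD HL,nn opcode.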

These techniques are very powerful when code size is at a 
premium. Any sequence of code that loads a register or register 
pair then jumps or calls a common routine is a candidate for this 
technique. You need a register pair to throw away, but this is 
usually easy to find. 

The final case of optimization is the most difficult, and 
involved complete logic redesigns. This area is so specific and 
lengthy that it will not be covered here. As so often stated in 
textbooks, it is "left as an exercise for the reader" to examine 
the original P2DOS source and identify areas which can be 
redesigned. Much logic redesign was required as a part of the 
added ZSDOS and ZDDOS features, though the effort didn't stop 
there.

Just as important as what we did to gain speed and reduce size is 
what we didn't do. P2DOS originally used some self modifying code 
in the error printing routine. We decided from the outset that we 
would avoid this practice (tempting though it is..) in order to 
produce code that could be ROMed and/or run on the Z280 in 
protected mode. This decision cost us several bytes of code, but 
allowed us to accomplish our goals.

PROGRAMMING FOR ZSDOS.

ZSDOS places a few restrictions on systems which do not exist in 
other CP/M compatible operating systems. The most significant is 
that the BIOS MUST NOT DISTURB THE IX REGISTER. So far, the Epson 
QX-10 and Zorba computers have been identified as having BIOSes 
that corrupt this register. With NZCOM, we have developed a 
"protective" NZBIOS (look for ZSNZBI12.LBR on most Z-Nodes) that 
shields the Z80 registers from ill-behaved BIOSes, but operation 
without NZCOM on such systems will require that the BIOS be re-
written.
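
The idea behind such a protective layer can be sketched as a 
small wrapper placed in front of each BIOS entry. This is only an 
illustration and is not the code found in ZSNZBI12.LBR; the label 
BCONO is a hypothetical name for the real BIOS console output 
routine, and the caller's stack is assumed to have two bytes to 
spare:

CONOUT:	PUSH	IX		; Save the register ZSDOS relies on
	CALL	BCONO		; ..call the ill-behaved BIOS code
	POP	IX		; Restore IX (A, HL, flags unchanged)
	RET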

On this topic, we would like to propose that all programmers 
observe register usage more closely. The Z80 alternate and index 
registers belong to APPLICATION programs, and must be preserved 
by all operating system components. On the other hand, the "I" 
and "R" registers, as well as all new 64180 and Z280 registers 
(with the exception of the Z280's SSP) belong to the BIOS since 
they are hardware specific and directly I/O related. The Z280 SSP 
should be reserved for BDOS use.

Before trying to access any of the expanded ZSDOS features 
discussed in the last issue, you should first ensure that the 
program is in fact executing under ZSDOS. This is a two-step 
procedure involving a call to check for CP/M 2.2, then a call to 
the ZSDOS Return Version function. By checking in this manner, 
your program will be able to identify CP/M 1, 2 and 3 (aka Plus) 
as well as ZSDOS, ZDDOS and ZRDOS. Code to accomplish this task 
is:

	LD	C,12		; Return CP/M Version
	CALL	0005		; ..via BDOS
	CP	30H		; Is it CP/M Plus?
	JR	NC,ISCPM3	; ..jump if so
	CP	20H		; Is it CP/M 1.x?
	JR	C,ISCPM1	; ..jump if so w/version # in A
	CP	22H		; Is it CP/M 2.2?
	JR	NZ,BADVER	; ..jump to unknown 2.x version
	LD	C,48		; Now make the extended call
	CALL	0005		; ..via BDOS
	LD	A,H		; Check the DOS type first
	CP	'D'		; Is it ZDDOS?
	JR	Z,ISZD		; ..jump if so, Ver # in L
	CP	'S'		; Is it ZSDOS?
	JR	Z,ISZS		; ..jump if so, Ver # in L
	OR	A		; Is it ZRDOS?
	JR	Z,ISZR		; ..jump if so, Ver # in L
	...			; Else can't identify, do error

Bridger Mitchell's Advanced CP/M column in TCJ #36 also provides 
sample code to perform this function. A slight variation on the 
above sequence is used in utilities provided with ZSDOS to enable 
them to work under a variety of different operating systems. We 
propose that any future Disk Operating System adopt this 
technique by returning its own unique character in the "H" 
register.

Many programs in the past have relied on unpublished locations 
within the BDOS to alter the performance or functionality of the 
system. With ZSDOS, we provide published "standard" ways to 
dynamically tailor DOS parameters. The most important way of 
accomplishing this is with a set of configuration bits, or flags. 
To accommodate future expansion, a word value of sixteen bits is 
defined with only the lower seven used in the current 1.0 
release. The Flag bits used in ZSDOS 1.0 are:

	D D D D D D D D
	7 6 5 4 3 2 1 0
	 \ \ \ \ \ \ \ \_Public File Access
	  \ \ \ \ \ \ \__Public/Path Write
	   \ \ \ \ \ \___Read-Only Disk
	    \ \ \ \ \____Fast Fixed Disk Relog
	     \ \ \ \_____Disk Change Warning
	      \ \ \______BDOS Search Path	*
	       \ \_______Path w/o SYS Attribute	*
	        \________(Reserved)

The cited function is activated by setting the respective bit to 
a "1", and disabled by clearing the bit to a "0". Since ZDDOS has 
no search path capability, the features marked with an asterisk 
pertain only to the full ZSDOS configuration, and are "don't 
care" bits in ZDDOS. The flag bits are returned in the "L" 
register, as the lower byte of the 16-bit flag word. Code for 
returning them is:

	LD	C,100		; Get the FLAGS bits
	CALL	0005		; ..with DOS call
	...			; "L" has present 7 bits

Likewise, the flags may be set from applications programs with 
Function 101 as:

	LD	DE,(FLAGS)	; 1.0 only recognizes byte in E
	LD	C,101		; Now set flags in ZSDOS
	CALL	0005		; ..with DOS call
	...			; New settings are now effective
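
Since Function 101 sets all of the flags at once, a program that 
wants to change a single option will normally read the current 
settings first. A minimal sketch, assuming that the reserved 
upper byte may be passed as zero under 1.0:

	LD	C,100		; Get the current flags
	CALL	0005		; ..returned in "L"
	LD	A,L
	OR	00000001B	; Turn on Public File Access (bit 0)
	LD	E,A		; ..leaving the other bits alone
	LD	D,0		; Upper byte unused in 1.0 (assumed)
	LD	C,101		; Now set the revised flags
	CALL	0005		; ..with DOS call

A program that changes the flags only temporarily can keep the 
byte returned in "L" and hand it back with Function 101 before 
it exits.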

Date and Time capabilities are just as easily accessed. The 6-
byte Clock data may be retrieved to a specified buffer with DOS 
Function 98 as:

	LD	DE,TIMEAD	; Address of 6-byte buffer
	LD	C,98
	CALL	0005		; Read Clock from DOS
	INC	A		; Any Errors? (FF --> 0)
	JR	Z,ERROR		; ..jump if error (no clock?)
	...			; Else use the retrieved time
TIMEAD:	DEFB	0,0,0,0,0,0	; Initialized Null DateSpec
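
As an illustration of using the retrieved time, the sketch below 
prints one byte of the buffer as two decimal digits. It assumes 
that the six bytes hold packed BCD values in the order year, 
month, day, hours, minutes, seconds; that layout is an assumption 
here and should be confirmed against the ZSDOS documentation. 
PRBCD and PRNIB are hypothetical labels:

	LD	A,(TIMEAD+3)	; Hours byte (assumed layout)
	CALL	PRBCD		; ..print as two digits
	...
PRBCD:	PUSH	AF		; Save the low digit
	RRCA			; Move the high digit down
	RRCA
	RRCA
	RRCA
	CALL	PRNIB		; ..and print it
	POP	AF		; Get the low digit back
PRNIB:	AND	0FH		; Mask to a single digit
	ADD	A,'0'		; ..make it ASCII
	LD	E,A
	LD	C,2		; Console Output function
	JP	0005		; ..print and return via BDOS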

With the File Date Stamping capabilities of ZSDOS, we developed a 
single standardized way of accessing individual file stamps. 
Function 102 will copy the set of stamps for a specified file to 
the current DMA address, while 103 will set the stamps for the 
specified file to the values at the current DMA address. Since 
all supported stamping methods (currently DateStamper(tm) and the 
CP/M Plus compatible P2DOS) feature the same format at the ZSDOS 
level, no user conversions are needed. Indeed, with the special 
stamp drivers provided in the ZSDOS package, either stamp type 
may be read, and Function 103 will write both types if the 
destination disk has been prepared for both. A sample of code 
used to copy stamp data from one file to another is:

	LD	DE,DSBUF	; Point to 15-byte stamp buffer
	LD	C,26		; ..and set the DMA address
	CALL	0005
	LD	DE,SRCFCB	; Source FCB (User set already)
	LD	C,102		; Get the source's Stamps
	CALL	0005
	...			; Set User to destination?
	LD	DE,DSTFCB	; Destination FCB
	LD	C,103		; Write Stamps from DMA buffer
	CALL	0005		; ..to Dest file
	...
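
The user area steps left open above can be handled with the 
standard CP/M Function 32 (Set User Code). In this sketch SRCUSR 
and DSTUSR are hypothetical equates with example values chosen 
only for illustration:

SRCUSR	EQU	0		; Source user area (example value)
DSTUSR	EQU	15		; Destination user area (example)

	LD	E,SRCUSR	; Select the source user area
	LD	C,32		; ..with Set User Code
	CALL	0005		; (do this before Function 102)
	...
	LD	E,DSTUSR	; Select the destination user area
	LD	C,32
	CALL	0005		; (do this before Function 103)
	...
DSBUF:	DEFS	15		; Stamp buffer used as the DMA area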


FINAL THOUGHTS.

ZSDOS was a labor of love. Though we didn't really start out to 
create such a significant step forward in 2.2 compatible BDOSes, 
it turned out that way. It is our hope that the ideas presented 
in ZSDOS will form the basis for the next generation of BDOS 
replacements. If nothing else, we hope that ZSDOS stimulates the 
Z80 compatible community to address the issues of standards for 
datestamping, enhanced error handling, and global file access.

The next step for an improved operating system will be to break 
the 64k barrier. Joe Wright and Jay Sage's efforts in dynamic 
system configuration with NZCOM are very useful, but fail to 
address the fundamental problem - we need to use the banked 
memory featured in most newer systems. Furthermore, this must be 
done in a way that allows existing applications to run properly. 
This means (unlike CP/M Plus) a BDOS that lets BIOS deblock, a 
BIOS jump table that is directly callable from all banks, system 
vectors at the normal locations, etc. This also means 
establishing standards for bank sizes and addresses, hardware and 
processor independence, and finally universal DOS level and BIOS 
level interfaces to banked memory. Other standards that will be 
needed by the next generation of OS's include banked RSX 
standards (though Bridger Mitchell and Malcom Kemp seem to have 
this nailed down), banked device driver standards, and expanded 
TCAPS and ENV definitions (aren't these properly BIOS structures, 
folks?). Now is the time to come together, speak up on these 
matters, carefully weigh all alternatives, and make our wishes 
known.

Also, we urge the community to support those doing active 
development for our systems by purchasing legal copies of the 
software you use. This will allow and encourage development of 
things like new, better, and faster banked systems with all the 
goodies we really want. We applaud the efforts of MicroPro in 
developing and releasing WordStar 4 for CP/M systems, and 
encourage other vendors to update their CP/M offerings in the 
fields of Database Management systems and Spreadsheets for the 
new generation of systems. Further, let's agree to agree on what 
we really want. In this manner, we can all concentrate our 
efforts on applications programs, not rewriting BDOS. In short, 
let's work together to create a computing environment that will 
turn the big blue clones green with envy.

In conclusion, what started as independent "labors of love" to 
produce a better operating system rapidly became identical 
obsessions as we reverted to counting clock cycles and bytes. We 
are satisfied with the results, and hope that others will benefit 
from our work and produce smaller, faster and more full-featured 
programs to help make our lives easier (and keep from emptying 
our wallets with requirements for constant upgrades). Finally, we 
must thank H.A.J. Ten Brugge for beginning this entire episode by 
releasing P2DOS. Without his efforts, none of us (Cam, Hal and 
Carson) would have been tempted into the area of operating system 
authorship, and would have left it to "others" to determine what 
we need in our respective systems.

APPENDIX: The hardware used in these analyses is:

System #1: MicroMint SB-180.

 Processor:	HD64180 operating at 6.144 MHz clock rate with
		No memory wait states and 2 IO wait states.
 Console:	Serial Console connected to ASCI port 1 at 19.2
		kbps, Interrupt-driven buffered keyboard input.
 Interfaces:	ETS180 IO+ providing SCSI interface and RTC.
 CCP:		ZCPR 3.3 with full environment.
 BIOS:		MicroMint 2.7 modified / XSystems XBIOS 1.1.
 Search Path:	$$:, A15: (Current Drive & User, then A15:)
 Hard Disk:	Syquest SQ-306R 5 Megabyte removable-media,
		Interleave of 3, 12 microsecond buffered seek,
		Adaptec 4010 controller.
		A: 1576k of 2552k free, 94 files, 68 in User 15.
		B: 2432k of 2568k Free, 17 files, 16 in User 1.
 Floppy Disks:	A: NEC 80-track DSDD, 4 mS step, 4 mS Head Load,
		16k of 782k free, 93 files, 68 in User 15.
		C: Shugart SA465 80-track DSDD, 6 mS step, 736k of
		782k Free, 17 files in User 1.

System #2: Ampro Little Board 1A.

 Processor:	Z80A operating at 4.0 MHz.
 Console:	Serial Console connected to DART port 1 at 9600
		baud, hardware handshake enabled.
 Interfaces:	SCSI daughter board with NCR 5380 driving 1610-4
		controller.
 CCP:		ZCPR 3.4 with full environment.
 BIOS:		Ampro V3.8/NZCOM.
 Search Path:	$$:, A2:, A0: (Current Drive & User, then A2, A0:)
 Hard Disks:	Seagate ST-225 20 Megabyte, interleave of 2,
		200 microsecond buffered seek, Shugart 1610-4
		controller. A Shugart 5Mb full height drive was
		also connected to the controller, but was not
		used in the test.
		A: 2744k of 8160k free, 425 files, 77 in User 2.
		C: 984k of 4192k free, 258 files, 32 in User 3.
 Floppy Drives:	A: Teac 55F 80 track DSDD, 6 mS step, 10k of
		782k free, 74 files.
		B: Teac 55F 80 track DSDD, 6 mS step, 736k of
		782k free, 17 files in User 0.

System #3: Homebrew SB-180 compatible.

 Processor:	Z-180 operating at 9.216 MHz clock rate with
		No memory wait states and 3 IO wait states.
 Console:	Serial Console connected to ASCI port 1 at 19.2
		kbps, Interrupt-driven buffered keyboard input.
 Interfaces:	ETS180 IO+ providing SCSI interface and RTC.
 CCP:		ZCPR 3.0 with full environment.
 BIOS:		MicroMint 2.7 modified / XSystems XBIOS 1.1.
 Search Path:	A15: (ZCPR 3.0 searches current, then A15:)
 Hard Disk:	Shugart SA-712 10 Megabyte, Interleave of 1,
		12 microsecond buffered seek, Shugart 1610-3
		controller.
		A: 324k of 2552k free, 179 files, 101 in User 15.
		D: 252k of 2792k Free, 438 files, 16 in User 5.
